Abstract

In humans, learning depends on the joint contribution of multiple interacting systems: working memory (WM), long-term memory (LTM), and reinforcement learning (RL). The present study aims to understand the relative contributions of these systems during learning, as well as the specific strategies individuals might rely on. Collins (2018) put forward a combined working memory-reinforcement learning model that addresses this question, but it largely ignores long-term memory. We built four idiographic ACT-R learning models (single-mechanism RL and LTM models, plus two integrated RL-LTM models: a meta-learning RL model and a parameter-bias RL model) using the Collins (2018) stimulus-response association task. Different models provided the best fits for different individual learners (LTM: 63%, RL: 1%, meta-RL: 12%, bias-RL: 21% of participants), which suggests that irreducible differences in learning and meta-learning strategies exist across individuals. The models predicted learning accuracy, learning rate, and testing accuracy for subjects in their respective groups.

Objectives

This report describes the four ACT-R models and the learning outcomes produced by changes in their parameters. It also describes how these models fit the behavioral data and details the properties of the best-fitting models and parameters. The specific objective of this project is to test whether the RLWM task can be modeled well by a group of pure and combined declarative and RL learning models. After fitting the models to participant data, we aim to extract parameters that may explain why and how learning unfolded as observed. If the parameters describe individual differences in learning, would they also predict other behavioral measures, such as working memory capacity and reinforcement learning accuracy?

ACT-R Models

Below are the four ACT-R models tested. Note that the bolded names appear throughout this document.

  • RL: A pure RL model based on production-utility learning in ACT-R. Its only two parameters are the learning rate (alpha) and the softmax temperature.

  • LTM: A declarative model that depends solely on the storage and retrieval of stimulus, response, and outcome in ACT-R’s declarative memory. This model depends on the decay rate, retrieval noise, and spreading activation parameters.

  • meta_RL: A combined RL-LTM model. Information about trials performed by the RL system is shared with and stored in LTM (declarative memory) for later use. An isolated (meta) RL system, itself a set of productions, learns and determines which subsystem, RL or LTM, is used throughout learning. Which subsystem is preferred depends on the specific set of parameters.

  • biased: A combined RL-LTM model. Information about trials performed by the RL system is not shared with the LTM portion of the model. An additional “strategy” parameter specifies a bias towards the RL model at the 20, 40, 60, and 80 percent marks of the learning and test trials.
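
As a rough illustration of the two single-mechanism components (a minimal Python sketch with made-up parameter values, not the actual ACT-R Lisp models), the RL model's production-utility update with softmax choice, and the LTM model's base-level activation with decay and retrieval noise, can be written as:

```python
import math
import random

def utility_update(u, reward, alpha=0.2):
    """ACT-R production-utility learning: U <- U + alpha * (R - U)."""
    return u + alpha * (reward - u)

def softmax_choice(utilities, temperature=0.5):
    """Boltzmann (softmax) choice probabilities over production utilities."""
    exps = [math.exp(u / temperature) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def base_level_activation(lags, decay=0.5, noise_sd=0.3):
    """ACT-R base-level learning: A = ln(sum over past uses of t_j^-d),
    plus retrieval noise (approximated here as Gaussian for simplicity)."""
    base = math.log(sum(t ** -decay for t in lags))
    return base + random.gauss(0.0, noise_sd)
```

Raising `alpha` speeds utility convergence toward the reward, while raising `decay` makes older memory traces contribute less activation; these are the levers the fitting procedure below searches over.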

Approach

The models are fit to behavioral data, and the best-fitting model and set of parameters are selected by comparing BIC values; the lowest BIC determines the winning model. To assess the quality of the fitted model and parameters, RLWM task learning features were compared to the model outcomes. The features of interest are:

  • Accuracy at the end of learning (accuracy after 12 stimulus presentations)
  • Accuracy at test
  • Change in accuracy from end of learning to test
  • Learning rate
  • Differences in the learning trajectories of the two set sizes

The expectations and outcomes are described below.
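
The selection step can be sketched as follows; the log-likelihoods, parameter counts, and trial count below are hypothetical. BIC = k ln(n) - 2 ln(L), and the lowest value wins:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical best log-likelihood and free-parameter count per model
fits = {
    "RL":      (-180.0, 2),  # alpha, softmax temperature
    "LTM":     (-160.0, 3),  # decay, retrieval noise, spreading activation
    "meta_RL": (-158.0, 5),
    "biased":  (-159.0, 6),
}
n_trials = 300  # assumed number of trials per participant

scores = {name: bic(ll, k, n_trials) for name, (ll, k) in fits.items()}
winner = min(scores, key=scores.get)
```

Note how the extra free parameters of the combined models are penalized: here meta_RL has the best raw likelihood but LTM wins on BIC.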

Results

Model fits

Of the four models compared, the LTM model fit the largest number of participants (57), followed by the biased version of the combined RL-LTM model (11) and the meta-RL combined model (11). The pure RL model best fit only 4 participants (Figure 1). This is a slight departure from our expectation that the combined RL-LTM models would fit the majority of participants. As observed, it suggests that most learners simply commit the stimulus-response associations to memory.

Figure 1. Counts of fit subjects by model


Within each group of participants (groups formed by preferred model type), there was only one best-fitting combination of values for the RL model's alpha and softmax parameters. For the most popular model, LTM, which fit 57 participants, there were surprisingly only 14 best-fitting parameter-value sets for the spreading activation, retrieval noise, and memory decay rate parameters. The biased model was the most diverse, at 11 parameter sets for 11 participants. The meta-RL model closely followed the biased model in terms of diversity of parameter-value sets, at 10 sets for 11 subjects. Figures 2 and 3 show the medians and ranges of the BIC values that determined that the LTM model is the best-fitting model, even when comparing only the BIC values for the parameter-value sets that fit participants best in each category of models.

Figure 2.


Figure 2 shows that the LTM model has the lowest BIC values.

Figure 3.


How consistent are the fits observed above? Given a participant's best fit, how many of the next-best-fitting parameter sets are in the same model category?

Statistics on where the next best fit occurs for each participant by model
model    mean       median  sd         min  max
Biased   164.45455  146     153.95932  3    395
LTM      14.40351   9       12.13593   3    53
Meta-RL  131.54545  119     151.15116  2    443
RL       2.00000    2       0.00000    2    2
Figure 4


Large differences among the first two to five ranked BIC values are needed to provide good evidence against the second- and higher-ranked best-fit models. Figure 5 below shows the rank-ordered differences between consecutive BIC values. The difference is largest between the first two models, but it falls short of providing strong evidence that the best-fit model is preferred over the second-best fit.
Figure 5

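
For reference, consecutive BIC differences can be graded with the conventional Kass and Raftery (1995) scale, on which a gap above roughly 10 counts as very strong evidence and a gap below 2 as negligible; the sorted BIC values below are made up for illustration:

```python
# Hypothetical BIC values for one participant's candidate fits
bics = sorted([337.1, 344.5, 352.2, 371.4])
deltas = [b - bics[0] for b in bics[1:]]  # differences from the best fit

def evidence_grade(delta):
    """Rough Kass & Raftery (1995) reading of a BIC difference."""
    if delta < 2:
        return "negligible"
    if delta < 6:
        return "positive"
    if delta < 10:
        return "strong"
    return "very strong"

grades = [evidence_grade(d) for d in deltas]
```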

These differences might look slightly different when broken apart by model type. Figure 6 below shows that the LTM model has a larger difference than the rest of the models, meaning that any participant best fit by the LTM model had more evidence against the second-best model fit than did participants best fit by the other models.
Figure 6


Out of curiosity, how often does the next-best fit for the same participant come from the same model? This would tell us whether or not subsequent fits differ only in their parameter values.
Figure 6


These subjects had second-best-fit models that came from a different model group. X1 to X3 are the BIC differences.
subjects  X1          X2         X3         model
6217       5.3616978  2.5667977  0.9835599  RL
15001      1.9317027  1.0882666  1.0623723  Meta-RL
15005      2.3491642  0.9896057  0.9199334  Meta-RL
15014      0.2556102  3.2446232  0.2976195  RL
15016      1.9222742  0.9258556  0.3815785  Meta-RL
28306     25.2416031  2.0253609  3.0521664  RL
28328     14.4697705  4.9703081  0.2965980  RL

Assessments of model fits

Looking at the learning curves for the four models in Figure 7, the differences in learning rates are apparent, as are other features such as the separation between the two set sizes. In the plot below, each data point is the average accuracy, for that number of stimulus presentations, across all parameter combinations. The LTM and RL models predict that an increase in set size does not diminish learning rate or accuracy. This analysis, however, washes out the individual differences that could be captured by the diverse set of parameter combinations.

Figure 7.


The panels in Figure 8 show the mean accuracy for the participant behavioral data. The model lines are averages across parameters for that group only. Since we are aiming for an individual-differences view of these data, collapsing across so much of this variability is uninformative, as was shown above, especially if the differences, once fit to actual behavioral data, indicate large differences in learning outcomes or in cognitive-faculty diagnostics such as working memory capacity. Here, only the best-fitting sets of parameter combinations were selected and collapsed. As can be seen in the figure below, the different model types appear vastly different, and some characteristics of the behavioral data have come through, such as the separation of the learning trajectories for the different set sizes in the RL-LTM biased model fit. Some parameter sets in the LTM model also capture the difficulty associated with increasing set size (solid lines in Fig. 8B). The LTM participants have, on average, the highest accuracies in the testing phase for both set sizes, but they are nearly indistinguishable from the meta-RL group in accuracy at the end of learning. The biased group shows the most separation between set sizes 3 and 6 during learning, and also lower accuracy at test than LTM. The biased group is negligibly different from the meta-RL group for set size 3 but shows a marked difference at set size 6, closely following the behavioral data.

Figure 8.


For reference, the group mean of all 83 subjects is shown in Figure 9 below.
Figure 9


There are five outcome measures of interest in the RLWM task: accuracy at the end of learning, accuracy at test, learning rate (characterized as the slope estimate over the first 6 trials), the difference in learning between set sizes 3 and 6, and the level of preserved learning at test for both set sizes (test minus learning). The following analyses compare the model data with the behavioral data on these outcome measures.
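
The learning-rate measure is just an ordinary least-squares slope of accuracy against presentation number over the first six presentations; a minimal sketch with made-up accuracies:

```python
def ols_slope(y):
    """Least-squares slope of y regressed on presentation number 1..len(y)."""
    n = len(y)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(y) / n
    num = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(xs, y))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical mean accuracies over the first six stimulus presentations
acc_first6 = [0.35, 0.50, 0.62, 0.70, 0.78, 0.85]
learning_rate = ols_slope(acc_first6)
```

With these illustrative values the slope comes out a little under 0.1, in the same general range as the behavioral slope estimates reported below.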

Figure 10 below shows accuracy at the end of learning and at test. The models closely track the behavioral data. Note that the RL group has only two data points.

2 x 2 set size by iteration (learn vs. test) ANOVA table for behavioral data
term               df   sumsq      meansq     statistic   p.value
setSize            1    0.0000641  0.0000641  0.0048209   0.9446873
iteration          1    1.2908057  1.2908057  97.1443094  0.0000000
setSize:iteration  1    0.1081829  0.1081829  8.1416983   0.0046007
Residuals          328  4.3583024  0.0132875  NA          NA
2 x 2 set size by iteration (learn vs. test) ANOVA table for model data
term               df   sumsq      meansq     statistic    p.value
setSize            1    0.0000343  0.0000343  0.0031057    0.9555921
iteration          1    1.8562669  1.8562669  168.2184894  0.0000000
setSize:iteration  1    0.0248099  0.0248099  2.2483246    0.1347210
Residuals          328  3.6194330  0.0110349  NA           NA

The models predict the learning rate for set size 3 in most cases (not in the explicit biased model; there are too few data points in RL to say). However, the models predicted the learning rate for set size 6 only in the biased model. See Figure 11 below.

mean and median of slope for behavioral data by set-size
setSize mean(estimate) median(estimate)
s3 0.1148164 0.1154762
s6 0.0800440 0.0825397
#> 
#>  Welch Two Sample t-test
#> 
#> data:  estimate by setSize
#> t = 10.149, df = 142.26, p-value < 2.2e-16
#> alternative hypothesis: true difference in means between group s3 and group s6 is not equal to 0
#> 95 percent confidence interval:
#>  0.02799973 0.04154511
#> sample estimates:
#> mean in group s3 mean in group s6 
#>       0.11481641       0.08004399
Descriptive stats of model and behavioral learning rate
setSize type model mean se
s3 behav Biased 0.1024892 0.0055733
s3 behav LTM 0.1161654 0.0021151
s3 behav Meta-RL 0.1181818 0.0062193
s3 behav RL 0.1202381 0.0054771
s3 model Biased 0.0746926 0.0083405
s3 model LTM 0.1068822 0.0008860
s3 model Meta-RL 0.1048312 0.0032958
s3 model RL 0.1247619 0.0000000
s6 behav Biased 0.0525253 0.0100787
s6 behav LTM 0.0840017 0.0029315
s6 behav Meta-RL 0.0763348 0.0054408
s6 behav RL 0.1095238 0.0084179
s6 model Biased 0.0669048 0.0078334
s6 model LTM 0.1042473 0.0007384
s6 model Meta-RL 0.0920563 0.0050243
s6 model RL 0.1327619 0.0000000
#> Analysis of Variance Table
#> 
#> Response: estimate
#>             Df   Sum Sq   Mean Sq F value   Pr(>F)    
#> type         1 0.000802 0.0008018  1.7664   0.1848    
#> model        2 0.030323 0.0151615 33.4032 7.29e-14 ***
#> type:model   2 0.001421 0.0007106  1.5656   0.2106    
#> Residuals  310 0.140707 0.0004539                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Analysis of Variance Table
#> 
#> Response: diff.mean
#>             Df  Sum Sq  Mean Sq F value    Pr(>F)    
#> model        2 0.31826 0.159129 54.5990 < 2.2e-16 ***
#> type         1 0.05096 0.050963 17.4861 4.871e-05 ***
#> model:type   2 0.02021 0.010103  3.4663   0.03372 *  
#> Residuals  152 0.44300 0.002914                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> # A tibble: 0 × 0
#> 
#>  Wilcoxon signed rank test with continuity correction
#> 
#> data:  diff.mean
#> V = 52, p-value = 0.09983
#> alternative hypothesis: true location is not equal to 0
#> 
#>  Wilcoxon signed rank exact test
#> 
#> data:  diff.mean
#> V = 0, p-value = 0.0009766
#> alternative hypothesis: true location is not equal to 0
#> 
#>  Wilcoxon signed rank test with continuity correction
#> 
#> data:  diff.mean
#> V = 56, p-value = 8.088e-10
#> alternative hypothesis: true location is not equal to 0
#> 
#>  Kruskal-Wallis rank sum test
#> 
#> data:  diff.mean and type
#> Kruskal-Wallis chi-squared = 9.7565, df = 1, p-value = 0.001787
#> # A tibble: 3 × 3
#>   group1  group2    p.value
#>   <chr>   <chr>       <dbl>
#> 1 LTM     Biased 0.00000826
#> 2 Meta-RL Biased 0.00230   
#> 3 Meta-RL LTM    0.341
Kruskal-Wallis one-way rank-sum test: s6-s3 learning curve differences by model type, behavioral data
statistic p.value parameter method
123.9622 0 2 Kruskal-Wallis rank sum test
Kruskal-Wallis one-way rank-sum test: s6-s3 learning curve differences by model type, model data
statistic p.value parameter method
229.5472 0 3 Kruskal-Wallis rank sum test
pairwise post-hoc tests for behav data
group1 group2 p.value
LTM Biased 0.0000000
Meta-RL Biased 0.0000000
Meta-RL LTM 0.1393724
RL Biased 0.0000000
RL LTM 0.6244517
RL Meta-RL 1.0000000
pairwise post-hoc tests for model data
group1 group2 p.value
LTM Biased 0.0000000
Meta-RL Biased 0.0000000
Meta-RL LTM 0.0000000
RL Biased 0.0000000
RL LTM 0.0025631
RL Meta-RL 0.0537204
2 (set size) x 2 (type: model or behavioral data) x 3 (model) ANOVA. RL excluded.
term df sumsq meansq statistic p.value
setSize 1 0.2487593 0.2487593 25.016136 0.0000010
type 1 0.0559410 0.0559410 5.625629 0.0183223
model 2 0.2206534 0.1103267 11.094856 0.0000224
setSize:type 1 0.0305314 0.0305314 3.070350 0.0807405
setSize:model 2 0.3386820 0.1693410 17.029545 0.0000001
type:model 2 0.0211612 0.0105806 1.064023 0.3463464
setSize:type:model 2 0.0289131 0.0144566 1.453806 0.2352992
Residuals 304 3.0229615 0.0099440 NA NA
Figure 10


It is difficult to assess what the model fits are capturing without examining the specific parameter sets more carefully, or without determining whether membership in a particular model group predicts some other cognitive or learning aspects of the subjects. A summary of the parameter data follows, first for the cohort of subjects.

Parameters

Parameter spread

Parameter summary: what is the spread of the parameters across participants in the models?
Figure 14.


Means and medians of parameter values
variable  mean       median
alpha     0.1500000  0.1500000
egs       0.3000000  0.3000000
bll       0.5500000  0.5500000
imag      0.2750000  0.2500000
ans       0.3000000  0.3000000
bias      0.3713112  0.3214167

Individual parameter effects on outcomes

Bias in meta-learning model

Some specific plans are to estimate the three LTM parameters for all 83 participants and see whether they are related to the WM and PSS measures. Also, how are the parameters related to the “separation” between s3 and s6?

Some more specific things to test might be the effect of delay between stimulus presentations.

What are the differences in learning type in terms of behavioral outcomes in other tasks?

These plots show group effects for uCLIMB subjects only, on Python and OLCTS measures and behavioral predictors.

We have 3Back and PSS measures for a large majority of participants; what are the group differences, if any, in these outcomes based on model fit?

Chantel’s request: combine language and programming measures and compare groups.

EEG Beta analysis

Individual plots: